Practical: early outbreak assessment, transmissibility and forecasting
Introduction
On the tidyverse
The tidyverse is a collection of R packages designed for data science. Developed with high standards of coding practices and software development, these packages form a consistent ecosystem to handle and visualise data. In this practical, we will mostly rely on the following packages:
dplyr for handling data, making new variables, building summaries, number-crunching
tidyr to re-arrange / tidy data
ggplot2 to visualise data
magrittr for the piping operator (%>%)
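As a quick illustration of how the pipe works, each intermediate result is passed on as the first argument of the next function call. A toy example (not part of the practical's data):

```r
library(dplyr)

# a small toy data.frame
df <- data.frame(group = c("a", "a", "b"))

# without the pipe, this would read: summarise(group_by(df, group), n = n())
# with the pipe, the same operation reads left to right:
df %>%
  group_by(group) %>%
  summarise(n = n())
```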
Most of these functionalities are summarised in handy cheatsheets. We provide links to the most relevant ones below:
basics of R (not tidyverse, but they remain very useful)
Other packages
Other packages we will use include:
Cheatsheets and other resources:
the rmarkdown cheatsheet
the knitr website documenting many options used in rmarkdown’s settings and code chunks
On the RECONverse
Several RECON packages will be used in the practical:
linelist for data cleaning
incidence older package to build and visualise epidemic curves, and do basic model fitting; it is being replaced by incidence2, but some older packages like earlyR (for estimating the reproduction number) and projections (for short-term forecasting) still use it
incidence2 re-implementation of incidence, to build and handle epidemic curves
i2extras provides simplified wrappers for estimating growth rates using different GLMs from incidence2 objects
epicontacts to visualise and analyse contact tracing data or transmission chains
epitrix to estimate delay distributions
distcrete to handle delay distributions
earlyR for early-stage estimation of the reproduction number
projections for short-term forecasting
Note that earlyR and projections will soon be adapted to handle incidence2 objects.
For some of these packages, we recommend using the github (development) version, either because it is more up-to-date, or because they have not been released on CRAN yet.
## instructions for github packages (latest versions)
remotes::install_github("reconhub/linelist")
remotes::install_github("reconhub/earlyR")
remotes::install_github("reconhub/i2extras")
#remotes::install_github("reconhub/projections")
What we will do today
This practical will use different outbreak data to illustrate how the various packages mentioned above can be combined to explore outbreak data and carry out analyses specific to the outbreak response context. Building on material seen in previous sessions, we will particularly look at:
- cleaning / standardising ‘dirty’ data
- building epidemic curves
- deriving empirical distributions of key delays
- fitting delays to discretised Gamma distributions using Maximum Likelihood, and summarising the results
- visualising and analysing transmission chains / contact tracing data
- estimating the growth rate and doubling time from epidemic curves
- estimating the reproduction number from epidemic curves
- performing short-term forecasting of case incidence
In addition, we strongly recommend storing all analyses in one or several rmarkdown reports.
How this document works
This practical will take you through initial exploratory analyses of different datasets. Some parts will be used merely to illustrate a technical point (e.g. how to count cases by different groups), some will focus more on the epidemiological meaning of a result, and some will do both. These analyses are merely meant as a guide, and are by no means an exhaustive exploration of the data. Be curious: feel free to ask new questions and try to answer them. In all instances, try to first answer your own questions: trial and error is a great way to learn R. But if you feel stuck, do not hesitate to ask questions to the demonstrators: they are here to help.
In this practical, most command lines are provided, but not all. In the few instances where code is not provided, you will typically need to re-use and adapt previous code to produce the result requested. In most cases, copy-paste should be sufficient to reproduce the results.
However, it is important that you understand what the code does. Look at relevant documentation, try tweaking the code, experiment, and do not hesitate to ask questions. It is not essential to go through all examples to illustrate important points, so take your time, especially if you are new to R. Also note that the solutions provided are by no means the only options for solving a given problem - feel free to explore alternatives and discuss them with the trainers.
A novel EVD outbreak in Ankh, Republic of Morporkia
This practical simulates the early assessment of an Ebola Virus Disease (EVD) outbreak. It introduces various aspects of data analysis of the early stage of an outbreak, including contact tracing data, characterisation of the serial interval, and estimation of the growth rate, doubling time and reproduction number from epidemic curves.
A new EVD outbreak has been notified in the small city of Ankh, located in the Northern, rural district of the Republic of Morporkia. Public Health Morporkia (PHM) is in charge of coordinating the outbreak response, and have contracted you as a consultant in outbreak analytics to inform the response in real time.
Importing data
While a new data update is pending, you have been given the linelist and contact data describing the early stages of the outbreak as xlsx files:
phm_evd_linelist_2017-10-27.xlsx: a linelist containing case information up to the 27th October 2017
phm_evd_contacts_2017-10-27.xlsx: a list of contacts reported between cases up to the 27th October 2017, where from indicates a potential source of infection, and to the recipient of the contact.
To import these files:
create a data/ folder in your project directory
download and save these files in data/
import the files using here::here to find the path to the data, and rio::import to read the files and import them into R
library(tidyverse)
linelist_raw <- here::here("data", "phm_evd_linelist_2017-10-27.xlsx") %>%
rio::import()
contacts_raw <- here::here("data", "phm_evd_contacts_2017-10-27.xlsx") %>%
rio::import()
Data cleaning
Once imported, the raw data should look like:
## linelist: one line per case
linelist_raw
## id Date of Onset Date.Report. SEX. Âge location
## 1 39e9dc 43018 43030 Female 62 Pseudopolis
## 2 664549 43024 43032 Male 28 peudopolis
## 3 B4D8AA 43025 “23/10/2017” male 54 Ankh-Morpork
## 4 51883d “18// 10/2017” 22-10-2017 male 57 PSEUDOPOLIS
## 5 947e40 43028 “2017/10/25” f 23 Ankh Morpork
## 6 9aa197 43028 “2017-10-23” f 66 AM
## 7 e4b0a2 43029 “2017-10-24” female 13 Ankh Morpork
## 8 AF0AC0 “2017-10-21” “26-10-2017” M 10 ankh morpork
## 9 185911 “2017-10-21” “26-10-2017” female 34 AM
## 10 601D2E “2017/10/22” “28-10-2017” <NA> 11 AM
## 11 605322 43030 “28 / 10 / 2017” FEMALE 23 Ankh Morpork
## 12 E399B1 23 / 10 / 2017 “28/10/2017” female 23 Ankh Morpork
## contacts: pairs of cases with reported contacts
contacts_raw
## Source Case #ID case.id
## 1 51883d 185911
## 2 b4d8aa e4b0a2
## 3 39e9dc b4d8aa
## 4 39E9DC 601d2e
## 5 51883d 9AA197
## 6 39e9dc 51883d
## 7 39E9DC e399b1
## 8 b4d8aa AF0AC0
## 9 39E9DC 947e40
## 10 39E9DC 664549
## 11 39e9dc 605322
Examine these data and try to list existing issues.
What is wrong with the raw data?
The raw data, whilst very small in size, have a number of issues, including:
capitalisation: mixture of upper case and lower cases
dates: their format is very heterogeneous, so that dates are not readily usable
special characters: accents, various separators and trailing / heading white spaces
typos / inconsistent coding: gender is sometimes indicated in full words, sometimes abbreviated; location has some obvious abbreviations and typos
We will use the function clean_data() from the linelist package to clean these data. This function will:
- set all characters to lower case
- use _ as universal separator
- remove all heading and trailing separators
- replace all accentuated characters by their closest ASCII match
- detect date formats and convert entries to Date when relevant (see argument guess_dates)
- replace typos and recode variables (see argument wordlists)
For this last part, we need to load a separate file containing cleaning rules: phm_evd_cleaning_rules.xlsx.
Examine this file (and read the ‘explanations’ tab), import it as the other files, and save the resulting data.frame as an object called cleaning_rules. The output should look like:
## bad_spelling good_spelling variable
## 1 f female sex
## 2 m male sex
## 3 .missing unknown sex
## 4 am ankh_morpork location
## 5 peudopolis pseudopolis location
## 6 .missing unknown location
We can now clean our data using clean_data; execute and interpret the following commands:
library(linelist)
## clean linelist
linelist <- linelist_raw %>%
clean_data(wordlist = cleaning_rules) %>%
mutate(across(contains("date"), guess_dates))
linelist
## id date_of_onset date_report sex age location
## 1 39e9dc 2017-10-10 2017-10-22 female 62 pseudopolis
## 2 664549 2017-10-16 2017-10-24 male 28 pseudopolis
## 3 b4d8aa 2017-10-17 2017-10-23 male 54 ankh_morpork
## 4 51883d 2017-10-18 2017-10-22 male 57 pseudopolis
## 5 947e40 2017-10-20 2017-10-25 female 23 ankh_morpork
## 6 9aa197 2017-10-20 2017-10-23 female 66 ankh_morpork
## 7 e4b0a2 2017-10-21 2017-10-24 female 13 ankh_morpork
## 8 af0ac0 2017-10-21 2017-10-26 male 10 ankh_morpork
## 9 185911 2017-10-21 2017-10-26 female 34 ankh_morpork
## 10 601d2e 2017-10-22 2017-10-28 unknown 11 ankh_morpork
## 11 605322 2017-10-22 2017-10-28 female 23 ankh_morpork
## 12 e399b1 2017-10-23 2017-10-28 female 23 ankh_morpork
## clean contacts
contacts <- clean_data(contacts_raw)
contacts
## source_case_id case_id
## 1 51883d 185911
## 2 b4d8aa e4b0a2
## 3 39e9dc b4d8aa
## 4 39e9dc 601d2e
## 5 51883d 9aa197
## 6 39e9dc 51883d
## 7 39e9dc e399b1
## 8 b4d8aa af0ac0
## 9 39e9dc 947e40
## 10 39e9dc 664549
## 11 39e9dc 605322
Looking at transmission chains
Exercise: contact tracing is at the centre of an Ebola outbreak response. Using the same approach seen in the previous practical, build an epicontacts object using the function make_epicontacts, and store the results as x.
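If you are stuck, one possible solution is sketched below, using the column names of the cleaned contacts data (source_case_id and case_id) and treating contacts as directed:

```r
library(epicontacts)

# build an epicontacts object from the cleaned linelist and contacts
x <- make_epicontacts(linelist = linelist,
                      contacts = contacts,
                      from = "source_case_id",
                      to = "case_id",
                      directed = TRUE)
x
```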
##
## /// Epidemiological Contacts //
##
## // class: epicontacts
## // 12 cases in linelist; 11 contacts; directed
##
## // linelist
##
## # A tibble: 12 × 6
## id date_of_onset date_report sex age location
## <chr> <date> <date> <chr> <dbl> <chr>
## 1 39e9dc 2017-10-10 2017-10-22 female 62 pseudopolis
## 2 664549 2017-10-16 2017-10-24 male 28 pseudopolis
## 3 b4d8aa 2017-10-17 2017-10-23 male 54 ankh_morpork
## 4 51883d 2017-10-18 2017-10-22 male 57 pseudopolis
## 5 947e40 2017-10-20 2017-10-25 female 23 ankh_morpork
## 6 9aa197 2017-10-20 2017-10-23 female 66 ankh_morpork
## 7 e4b0a2 2017-10-21 2017-10-24 female 13 ankh_morpork
## 8 af0ac0 2017-10-21 2017-10-26 male 10 ankh_morpork
## 9 185911 2017-10-21 2017-10-26 female 34 ankh_morpork
## 10 601d2e 2017-10-22 2017-10-28 unknown 11 ankh_morpork
## 11 605322 2017-10-22 2017-10-28 female 23 ankh_morpork
## 12 e399b1 2017-10-23 2017-10-28 female 23 ankh_morpork
##
## // contacts
##
## # A tibble: 11 × 2
## from to
## <chr> <chr>
## 1 51883d 185911
## 2 b4d8aa e4b0a2
## 3 39e9dc b4d8aa
## 4 39e9dc 601d2e
## 5 51883d 9aa197
## 6 39e9dc 51883d
## 7 39e9dc e399b1
## 8 b4d8aa af0ac0
## 9 39e9dc 947e40
## 10 39e9dc 664549
## 11 39e9dc 605322
You can easily plot these contacts, but with a little bit of tweaking (see ?vis_epicontacts) you can customise shapes by gender:
plot(x, node_shape = "sex",
shapes = c(male = "male",
female = "female",
unknown = "question-circle"),
selector = FALSE)
Question: What can you say about these contacts? Is there evidence of super-spreading? Can you estimate it at this stage?
Looking at incidence curves
The first question PHM asks you is simply: how bad is it? Given that this is a terrible disease, with a mortality rate nearing 70%, there is a lot of concern about this outbreak getting out of control. The first step of the analysis lies in drawing an epicurve, i.e. a plot of incidence over time.
In principle, the newer package incidence2 should be used to build epidemic curves. However, here we shall use its predecessor incidence, which provides facilities for estimating growth rates that we will use later.
We compute daily incidence based on the dates of symptom onset, storing results in an object called i:
i <- incidence::incidence(linelist$date_of_onset)
i
## <incidence object>
## [12 cases from days 2017-10-10 to 2017-10-23]
##
## $counts: matrix with 14 rows and 1 columns
## $n: 12 cases in total
## $dates: 14 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 14 days
## $cumulative: FALSE
large_txt <- theme(text = element_text(size = 16),
axis.text = element_text(size = 14))
rotate_x_txt <- theme(axis.text.x = element_text(angle = 45,
hjust = 1))
plot(i, show_cases = TRUE) +
theme_bw() +
large_txt +
rotate_x_txt +
labs(title = "Epidemic curve")
Exercise: If you pay close attention to the dates on the x-axis, you may notice that something is missing. Indeed, the graph stops right after the last case, but field epidemiologists from PHM insist that the data should be complete until the 27th October 2017. Try to fix this using the argument last_date in the incidence function.
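If you are unsure where to start, the fix amounts to passing the date of completeness to last_date (a sketch; the date itself comes from the field epidemiologists):

```r
# extend the epicurve to the date up to which data are known to be complete
i <- incidence::incidence(linelist$date_of_onset,
                          last_date = as.Date("2017-10-27"))
i
```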
## <incidence object>
## [12 cases from days 2017-10-10 to 2017-10-27]
##
## $counts: matrix with 18 rows and 1 columns
## $n: 12 cases in total
## $dates: 18 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 18 days
## $cumulative: FALSE
Estimating the growth rate (r)
The simplest model of incidence is probably the log-linear model, i.e. a linear regression on log-transformed incidences. In the incidence package, the function fit will estimate the parameters of this model from an incidence object (here, i), and derive estimates of doubling / halving times and associated confidence intervals:
f <- i %>%
incidence::fit()
f
## <incidence_fit object>
##
## $model: regression of log-incidence over time
##
## $info: list containing the following items:
## $r (daily growth rate):
## [1] 0.05352107
##
## $r.conf (confidence interval):
## 2.5 % 97.5 %
## [1,] -0.0390633 0.1461054
##
## $doubling (doubling time in days):
## [1] 12.95092
##
## $doubling.conf (confidence interval):
## 2.5 % 97.5 %
## [1,] 4.744158 -17.74421
##
## $pred: data.frame of incidence predictions (8 rows, 5 columns)
plot(i, show_cases = TRUE, fit = f) +
theme_bw() +
large_txt +
rotate_x_txt +
labs(title = "Epidemic curve and log-linear fit")
Question: How would you interpret this result? What criticisms would you make of this model?
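As a sanity check, the doubling time reported in the fit output follows directly from the estimated growth rate:

```r
# doubling time T_d = log(2) / r, using the daily growth rate from the fit
log(2) / 0.05352107  # approximately 12.95 days, matching $doubling above
```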
Estimating the reproduction number (R)
Branching process models can be used to estimate the reproduction number (R), given incidence data and a serial interval distribution. As you have already built an incidence object, the only missing information relates to the serial interval distribution. As current data are insufficient to estimate it, you can use previously published estimates with a mean of 15.3 days and a standard deviation of 9.3 days (WHO Ebola Response Team 2014).
Exercise: Look at the documentation of the function get_R from the earlyR package, and identify the arguments corresponding to the incidence data, the mean and standard deviations of the serial interval. Run the function with the correct arguments, and store the output as a new object called R; results should resemble:
You can visualise the results as follows:
R
##
## /// Early estimate of reproduction number (R) //
## // class: earlyR, list
##
## // Maximum-Likelihood estimate of R ($R_ml):
## [1] 3.023023
##
##
## // $lambda:
## NA 0.01838179 0.0273192 0.03514719 0.0414835 0.04623398...
##
## // $dates:
## [1] "2017-10-10" "2017-10-11" "2017-10-12" "2017-10-13" "2017-10-14"
## [6] "2017-10-15"
## ...
##
## // $si (serial interval):
## A discrete distribution
## name: gamma
## parameters:
## shape: 2.70655567117586
## scale: 5.65294117647059
plot(R) +
geom_vline(xintercept = 1, color = "#ac3973") +
theme_bw() +
large_txt
The first figure shows the distribution of likely values of R, and the Maximum-Likelihood (ML) estimate. The second figure shows the global force of infection over time, with dashed bars indicating dates of onset of the cases.
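For reference, the estimate above can be obtained with a call along the following lines (check argument names against ?get_R):

```r
library(earlyR)

# serial interval: mean 15.3 days, sd 9.3 days (WHO Ebola Response Team 2014)
R <- get_R(i, si_mean = 15.3, si_sd = 9.3)
```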
Question: Interpret these results: what do you make of the reproduction number? What does it reflect? Based on the last part of the epicurve, some colleagues suggest that incidence is going down and the outbreak may be under control. What is your opinion on this?
Short-term forecasting
The function project from the package projections can be used to simulate plausible epidemic trajectories by simulating daily incidence using the same branching process as the one used to estimate \(R_0\) in earlyR. All that is needed is one or several values of \(R_0\) and a serial interval distribution, stored as a distcrete object. As this one is included in the output of earlyR, we have all the necessary inputs to carry out these simulations.
Here, we illustrate how we can simulate 5 random trajectories using a fixed value of \(R_0\) = 3.02, the ML estimate of \(R_0\):
library(projections)
project(i, R = R$R_ml, si = R$si, n_sim = 5, n_days = 10, R_fix_within = TRUE)
##
## /// Incidence projections //
##
## // class: projections, matrix, array
## // 10 dates (rows); 5 simulations (columns)
##
## // first rows/columns:
## [,1] [,2] [,3] [,4] [,5]
## 2017-10-28 3 5 6 4 3
## 2017-10-29 4 3 3 1 2
## 2017-10-30 1 3 1 2 0
## 2017-10-31 3 7 3 1 4
## .
## .
## .
##
## // dates:
## [1] "2017-10-28" "2017-10-29" "2017-10-30" "2017-10-31" "2017-11-01"
## [6] "2017-11-02" "2017-11-03" "2017-11-04" "2017-11-05" "2017-11-06"
Exercise: Read the commands below and interpret the results.
proj <- project(i, R = sample_R(R, 5000), si = R$si,
n_sim = 5000, n_days = 14,
R_fix_within = TRUE)
plot(i) %>%
add_projections(proj, c(.1, .5, .9)) +
theme_bw() +
scale_x_date() +
large_txt +
rotate_x_txt +
labs(title = "Short term forecasting of new cases")
apply(proj, 1, summary)
## 2017-10-28 2017-10-29 2017-10-30 2017-10-31 2017-11-01 2017-11-02
## Min. 0.0000 0.000 0.0000 0.0000 0.0000 0.0000
## 1st Qu. 1.0000 1.000 1.0000 1.0000 1.0000 1.0000
## Median 2.0000 2.000 2.0000 2.0000 2.0000 3.0000
## Mean 2.0018 2.111 2.2122 2.4542 2.6718 2.9518
## 3rd Qu. 3.0000 3.000 3.0000 4.0000 4.0000 4.0000
## Max. 10.0000 12.000 11.0000 13.0000 13.0000 16.0000
## 2017-11-03 2017-11-04 2017-11-05 2017-11-06 2017-11-07 2017-11-08
## Min. 0.0000 0.0000 0.0000 0.0000 0.0000 0.000
## 1st Qu. 2.0000 2.0000 2.0000 2.0000 2.0000 3.000
## Median 3.0000 3.0000 4.0000 4.0000 4.0000 5.000
## Mean 3.3486 3.7316 4.1852 4.7128 5.3866 6.087
## 3rd Qu. 5.0000 5.0000 6.0000 6.0000 7.0000 8.000
## Max. 18.0000 24.0000 32.0000 33.0000 40.0000 43.000
## 2017-11-09 2017-11-10
## Min. 0.000 0.0000
## 1st Qu. 3.000 3.0000
## Median 5.000 6.0000
## Mean 6.865 7.7546
## 3rd Qu. 9.000 10.0000
## Max. 62.000 78.0000
apply(proj, 1, function(x) mean(x > 0))
## 2017-10-28 2017-10-29 2017-10-30 2017-10-31 2017-11-01 2017-11-02 2017-11-03
## 0.8316 0.8492 0.8606 0.8826 0.8924 0.9034 0.9252
## 2017-11-04 2017-11-05 2017-11-06 2017-11-07 2017-11-08 2017-11-09 2017-11-10
## 0.9266 0.9446 0.9484 0.9570 0.9628 0.9628 0.9690Question: According to these results, what are the chances that more cases will appear in the near future? Is this outbreak being brought under control? Would you recommend scaling up / down the response?
COVID-19 in the UK
For the second part of this practical, we look at real data reporting COVID-19 hospitalisations in England, broken down by hospital and National Health Service (NHS) region. The data are available online from NHS England’s website. The dataset analysed here is a simplified version, providing incidence of hospital admissions by NHS trust, stored in the file covid_admissions_uk_2020_10_24.xlsx.
Importing the data
Download this dataset, save it in the data/ folder, and then import the data by typing:
path_to_file <- here::here("data", "covid_admissions_uk_2020_10_24.xlsx")
covid_eng <- path_to_file %>%
rio::import() %>%
tibble() %>%
mutate(date = as.Date(date)) # date-time -> date
covid_eng
## # A tibble: 31,614 × 5
## date region org_name org_code n
## <date> <chr> <chr> <chr> <dbl>
## 1 2020-03-20 East of England Southend University Hospital NHS F… RAJ 0
## 2 2020-03-20 East of England Bedford Hospital NHS Trust RC1 2
## 3 2020-03-20 East of England Luton and Dunstable University Hos… RC9 0
## 4 2020-03-20 East of England The Queen Elizabeth Hospital, King… RCX 1
## 5 2020-03-20 East of England Milton Keynes University Hospital … RD8 0
## 6 2020-03-20 East of England Basildon and Thurrock University H… RDD 0
## 7 2020-03-20 East of England East Suffolk and North Essex NHS F… RDE 0
## 8 2020-03-20 East of England Royal Papworth Hospital NHS Founda… RGM 0
## 9 2020-03-20 East of England North West Anglia NHS Foundation T… RGN 0
## 10 2020-03-20 East of England James Paget University Hospitals N… RGP 0
## # … with 31,604 more rows
The dataset includes:
date: the date of admission
region: the NHS region
org_name: the full name of the NHS trust
org_code: a short code for the NHS trust
n: number of new, confirmed COVID-19 cases admitted, including inpatients who tested positive on that day, and new admissions with a positive test
Distribution of cases by NHS region
Here, we try to get an overall picture of the distribution of hospital admissions across England. The following code sums the number of cases reported by region:
covid_eng %>%
group_by(region) %>%
summarise(cases = sum(n))
## # A tibble: 7 × 2
## region cases
## <chr> <dbl>
## 1 East of England 10699
## 2 London 25689
## 3 Midlands 24199
## 4 North East and Yorkshire 21556
## 5 North West 25041
## 6 South East 14335
## 7 South West 6287
Exercise: Adapt this code to derive the total number of cases by region and NHS trust, provide a visual representation of the resulting numbers, and interpret the results; your plot may resemble:
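One possible approach is sketched below, assuming the tidyverse is loaded; the exact geometry and ordering of the plot are up to you:

```r
# total admissions by region and trust, one panel per region
covid_eng %>%
  group_by(region, org_name) %>%
  summarise(cases = sum(n), .groups = "drop") %>%
  ggplot(aes(x = cases, y = org_name)) +
  geom_col() +
  facet_wrap(vars(region), scales = "free_y")
```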
Epidemic curves
Unlike linelists of cases, this dataset contains pre-calculated counts by date, NHS region and trust. While incidence2 is primarily designed to handle linelist-like data, it can also convert pre-computed incidences using the count argument:
library(incidence2)
# build the incidence2 object
i_covid <- covid_eng %>%
incidence2::incidence(date,
interval = "week",
count = n,
groups = region)
# summary
i_covid %>%
summary()
## date range: [2020-W12] to [2020-W43]
## n: 127806
## interval: 1 (Monday) week
## cumulative: FALSE
## timespan: 224 days
##
## 1 grouped variable
##
## region n
## <chr> <dbl>
## 1 East of England 10699
## 2 London 25689
## 3 Midlands 24199
## 4 North East and Yorkshire 21556
## 5 North West 25041
## 6 South East 14335
## 7 South West 6287
# plot with regions as colors
i_covid %>%
plot(fill = region, col_pal = muted, angle = 45, legend = "bottom") +
labs(title = "Weekly incidence of COVID-19 hospital admissions in England")
Exercise: Use facet_plot, tweaking the arguments angle, format and the one specifying the number of dates on the x-axis, to reproduce the plot below; which do you think is best at representing the dynamics of COVID-19 in England?
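As a starting point, a call along these lines should get you close; n_breaks is our guess for the argument controlling the number of dates shown, so check it against ?facet_plot:

```r
# one panel per region; tweak angle, format and n_breaks as needed
i_covid %>%
  facet_plot(facets = region, angle = 45, n_breaks = 4) +
  labs(title = "Weekly incidence of COVID-19 hospital admissions in England")
```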
Growth rates
The new package i2extras adds a number of tools for incidence2 objects, including different GLMs to fit epicurves and derive growth rates.
To illustrate these approaches, we first select the data so that:
- we use daily incidences by region
- we keep the last 4 weeks of data
- we exclude the last week of data, as it may be prone to biases due to reporting delays
last_date <- covid_eng %>%
pull(date) %>%
max()
last_date
## [1] "2020-10-24"
# version using keep_first and keep_last from i2extras
i_recent <- covid_eng %>%
incidence2::incidence(date,
count = n,
groups = region) %>%
keep_last(4 * 7) %>% # keep last 4 weeks of data
keep_first(3 * 7) # remove last week of data as incomplete
## # 'old' version using 'filter'
## i_recent <- covid_eng %>%
## incidence2::incidence(date,
## count = n,
## groups = region) %>%
## filter(date_index > (last_date - 28), # keep last 4 weeks of data
## date_index <= (last_date - 7)) # remove last week of data
i_recent %>%
summary()
## date range: [2020-09-27] to [2020-10-17]
## n: 9788
## interval: 1 day
## cumulative: FALSE
## timespan: 21 days
##
## 1 grouped variable
##
## region n
## <chr> <dbl>
## 1 East of England 405
## 2 London 931
## 3 Midlands 1688
## 4 North East and Yorkshire 2617
## 5 North West 3224
## 6 South East 534
## 7 South West 389
i_recent %>%
facet_plot()
Exercise: We can now fit models to these data. To do so, try doing the following:
load the package i2extras; if not installed, follow instructions from the package’s website
use the function fit_curve() to fit a negative binomial model to the incidence data by region, save the output as a new object last_trends, and plot it
feed this output to the function growth_rate() to obtain estimates of growth rates, doubling times and their confidence intervals, for each region; the results should resemble:
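A sketch of the corresponding commands (model name to be checked against ?fit_curve):

```r
library(i2extras)

# fit a negative binomial GLM to each region's epicurve
last_trends <- i_recent %>%
  fit_curve(model = "negbin")
plot(last_trends)

# growth rates, doubling times and confidence intervals by region
last_trends %>%
  growth_rate()
```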
## # A tibble: 7 × 9
## count_variable region data model estimates fitting_warning fitting_error
## <chr> <chr> <list<t> <lis> <list> <list> <list>
## 1 n East of… [21 × 2] <glm> <df [21 … <NULL> <NULL>
## 2 n London [21 × 2] <glm> <df [21 … <NULL> <NULL>
## 3 n Midlands [21 × 2] <glm> <df [21 … <NULL> <NULL>
## 4 n North E… [21 × 2] <glm> <df [21 … <NULL> <NULL>
## 5 n North W… [21 × 2] <glm> <df [21 … <NULL> <NULL>
## 6 n South E… [21 × 2] <glm> <df [21 … <NULL> <NULL>
## 7 n South W… [21 × 2] <glm> <df [21 … <NULL> <NULL>
## # … with 2 more variables: prediction_warning <list>, prediction_error <list>
## # A tibble: 7 × 10
## count_variable region model r r_lower r_upper growth_or_decay time
## <chr> <chr> <lis> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 n East of Eng… <glm> 0.0495 0.0331 0.0661 doubling 14.0
## 2 n London <glm> 0.0504 0.0395 0.0613 doubling 13.8
## 3 n Midlands <glm> 0.0556 0.0475 0.0638 doubling 12.5
## 4 n North East … <glm> 0.0491 0.0426 0.0556 doubling 14.1
## 5 n North West <glm> 0.0596 0.0537 0.0655 doubling 11.6
## 6 n South East <glm> 0.0721 0.0574 0.0871 doubling 9.61
## 7 n South West <glm> 0.0948 0.0769 0.113 doubling 7.31
## # … with 2 more variables: time_lower <dbl>, time_upper <dbl>
The code below can be used to visualise the results:
last_trends %>%
growth_rate() %>%
ggplot(aes(y = region, x = r)) +
geom_point() +
geom_errorbar(aes(xmin = r_lower, xmax = r_upper)) +
geom_vline(xintercept = 0, linetype = 2) +
theme_bw() +
labs(title = "Estimates of daily growth rates of COVID-19 in England",
subtitle = sprintf(
"based on hospital admissions, %s - %s",
format(min(get_dates(i_recent)), "%d %B %Y"),
format(max(get_dates(i_recent)), "%d %B %Y")),
y = "",
x = "Daily growth rate")
Exercise: repeat the same figure with doubling times. Taken together, what do these results suggest about the evolution of COVID-19 in the UK over the coming weeks? What would be the doubling time for growth rates (r) close to 0? When would you recommend using doubling times for communicating results on the evolution of an epidemic?
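A quick numerical check helps with the question on small growth rates: the doubling time log(2) / r grows without bound as r approaches 0, making it an unstable summary for near-flat epicurves:

```r
# doubling times for increasingly small daily growth rates
log(2) / c(0.1, 0.01, 0.001)  # roughly 6.9, 69.3 and 693.1 days
```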
References
WHO Ebola Response Team. 2014. “Ebola Virus Disease in West Africa–the First 9 Months of the Epidemic and Forward Projections.” N. Engl. J. Med. 371 (16): 1481–95.